VISTA: Validating and Refining Clusters via Visualization (final version)
Authors
Abstract
Clustering is an important technique for understanding large multi-dimensional datasets. Most clustering research to date has focused on developing automatic clustering algorithms and cluster validation methods. The automatic algorithms are known to work well on clusters of regular shapes, e.g. compact spherical shapes, but may incur higher error rates on arbitrarily shaped clusters. Although some effort has been devoted to the problem of skewed datasets, the handling of irregularly shaped clusters is still in its infancy, especially with respect to the dimensionality of the datasets and the precision of the clustering results considered. Not surprisingly, statistical indices are also ineffective at validating clusters of irregular shapes. In this paper, we address the problem of clustering and validating arbitrarily shaped clusters with a visual framework (VISTA). The main idea of the VISTA approach is to capitalize on the power of visualization and interactive feedback to encourage domain experts to participate in the cluster revision and cluster validation process. The VISTA system has two unique features. First, it implements a linear and reliable visualization model to interactively visualize multi-dimensional datasets in a 2D star-coordinate space. Second, it provides a rich set of user-friendly interactive rendering operations, allowing users to validate and refine the cluster structure based on their visual experience as well as their domain knowledge.
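The abstract describes a linear model that maps multi-dimensional points into a 2D star-coordinate space. As a rough illustration only, the sketch below shows a generic star-coordinate projection: each of the k dimensions gets an axis at angle 2πi/k, values are min-max normalized, and the 2D position is the weighted sum of the axis vectors. The function name `star_coordinates`, the per-axis weights `alphas`, and the `scale` factor are assumptions made for this example; the exact normalization and parameters used by VISTA are not given in the abstract.

```python
import math

def star_coordinates(rows, alphas=None, scale=1.0):
    """Project k-dimensional rows onto a 2D star-coordinate plane.

    Hypothetical helper for illustration; not the VISTA implementation.
    """
    k = len(rows[0])
    if alphas is None:
        alphas = [1.0] * k  # assumed per-axis weights, adjustable interactively
    # Per-dimension minimum and maximum for min-max normalization to [0, 1]
    lows = [min(r[i] for r in rows) for i in range(k)]
    highs = [max(r[i] for r in rows) for i in range(k)]
    # Axis i points in direction 2*pi*i/k around the origin
    angles = [2.0 * math.pi * i / k for i in range(k)]
    points = []
    for r in rows:
        x = y = 0.0
        for i in range(k):
            span = highs[i] - lows[i]
            v = (r[i] - lows[i]) / span if span > 0 else 0.0
            x += alphas[i] * v * math.cos(angles[i])
            y += alphas[i] * v * math.sin(angles[i])
        # Divide by k so the result stays bounded as dimensions are added
        points.append((scale * x / k, scale * y / k))
    return points

if __name__ == "__main__":
    data = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
    for p in star_coordinates(data):
        print(p)
```

Because the mapping is linear in the normalized values, changing one weight in `alphas` moves every point along that axis direction in proportion to its value on that dimension, which is the kind of interactive adjustment a user could apply while inspecting cluster structure.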
Similar resources
VISTA: validating and refining clusters via visualization
Clustering is an important technique for understanding of large multi-dimensional datasets. Most of clustering research to date has been focused on developing automatic clustering algorithms and cluster validation methods. The automatic algorithms are known to work well in dealing with clusters of regular shapes, e.g. compact spherical shapes, but may incur higher error rates when dealing with ...
Validating and Refining Clusters via Visual Rendering
Clustering is an important technique for understanding and analysis of large multi-dimensional datasets in many scientific applications. Most of clustering research to date has been focused on developing automatic clustering algorithms or cluster validation methods. The automatic algorithms are known to work well in dealing with clusters of regular shapes, e.g. compact spherical shapes, but may...
Optimizing star-coordinate visualization models for effective interactive cluster exploration on big data
Interactive visual cluster analysis is the most intuitive way for finding clustering patterns, validating algorithmic clustering results, understanding data clusters with domain knowledge, and refining cluster definitions. The most challenging step is visualizing multidimensional data and allowing a user to interactively explore the data to identify clustering structures. In this paper, we syst...
The L.L. Thurstone Psychometric Laboratory University of North Carolina
This chapter presents ViSta-PrnCmp, the module for Principal Components Analysis (PCA) in ViSta. This procedure is capable of analyzing numerical variables so they can be represented on a lower dimensionality space. The visualization for ViSta-PrnCmp includes a scatterplot-matrix of component scores; a bi-dimensional biplot; a tri-dimensional (spin-plot) version of the biplot; a box-diamond-dot...
VISTA Variables in the Via Lactea (VVV): The public ESO near-IR variability survey of the Milky Way
We describe the public ESO near-IR variability survey (VVV) scanning the Milky Way bulge and an adjacent section of the mid-plane where star formation activity is high. The survey will take 1929 hours of observations with the 4-metre VISTA telescope during five years (2010-2014), covering ∼10^9 point sources across an area of 520 deg², including 33 known globular clusters and ∼350 open cluste...